Abstract. In recent data mining research, frequent structures in a sequence of graphs
have been studied intensively, and one of the main concerns is structures that change along a
sequence of graphs, which can capture dynamic properties of data. In contrast, we focus on
"preserving structures" in a graph sequence, that is, structures that satisfy a given property
for a certain period, and we study the mining of such structures. As a starting point, we
consider two structures, a connected vertex subset and a clique, that exist for a certain period.
We consider the problem of enumerating these structures and present polynomial delay
algorithms for it. Their running time may depend on the size of the representation; however,
if each edge has at most one time interval in the representation, the running time is
$O(|V||E|^3)$ for connected vertex subsets and $O(\min\{\Delta^5, |E|^2\Delta\})$ for cliques,
where the input graph is $G = (V,E)$ with maximum degree $\Delta$. To the best of our knowledge,
this is the first approach to the treatment of this notion, namely, preserving structures.

1 Introduction 

In a computerized society and in the era of explosive growth in data volumes, nobody can 
doubt the importance of data mining, that is, extracting useful information (knowledge) 
from a huge data repository. A classic example of data mining research is finding
association rules from a relational database [1]. We can classify raw data by its type, e.g., 
numerical data, relational data, structured data, and so on. Among these types, data that 
has a certain kind of graph structure (graph structured data) has become important, since 
it can represent a variety of complex objects that appear in practical applications such 
as genome interactions, chemical compounds, hyperlinks on the Web, and XML (so-called 
semi-structured data).

Extracting useful facts from graph structured data is often achieved by specifying and/or
finding frequent substructures in a graph, in other words, by pattern mining in graphs (or graph
mining) [2, 13, 27]. In the case of the hyperlink structure of the Web (namely, the webgraph), for
example, a clique is considered to be formed by a community focused on a certain topic, and
finding it may be useful for tracing a social phenomenon on the Web [26]. These observations
imply that one of the most promising approaches for graph mining is enumeration,
and efficient enumeration of crucial substructures has a rich history. As for cliques, a
theoretically efficient algorithm is presented in [18], and both [18] and [25] are state-of-the-art
algorithms that perform well in practice. Enumerations of paths and matchings are studied
in [20] and [9], respectively, and enumeration of connected components is studied in [3].
Here, we remark that all these algorithms work on a single graph.

In a practical situation, however, graph structures often change over time, and such
data is collected periodically along a time series. In this setting, not only
information acquired separately from single graphs but also information from graph patterns
appearing sequentially could be important. Along this direction, there are several research
topics of interest so far. Finding graph patterns that appear periodically in a graph sequence is
studied in [11, 16]. Graph patterns that frequently appear during a certain period are also
studied in [6]. On the other hand, some studies address the change patterns that appear frequently
in a graph sequence composed of graphs with edge insertions/deletions, such as changes
between two time periods [4] and changes of subsequences [12]. Furthermore, there are
several studies focusing on clustering of vertices by utilizing graph sequences [22-24]. We
mention here that enumeration is again a powerful tool for achieving these objectives.

Objective. Building on this preceding research, we propose a new concept in graph mining,
that is, finding a part of a graph that satisfies a given property continuously for a long time.
More specifically, we consider the problem of enumerating all substructures that satisfy a
given property during a prescribed period, i.e., those appearing in a consecutive subsequence
of a graph sequence. We call such structures preserving structures in a graph sequence,
and we call the problem of enumerating all such structures preserving structure mining in general.
As such properties, we consider connected vertex subsets and cliques in this paper.
For example, a topic on the Web that is controversial for a long time may correspond 
to a clique that exists in a consecutive sequence of webgraphs during a certain period. As 
another example, a group of a species in a wildlife environment may constitute a consecutive 
sequence of connected vertex subsets in a sequence of graphs that are constructed from its 
trajectory data [14, 17]. To the best of our knowledge, this study is the first case in which 
a "long-lasting" structure is regarded as the target structure to be found. 

Related works. (1) Pattern mining in graph sequences. This is already explained just 
before the objective. 

(2) Dynamic flow. On a dynamic network defined by a graph with capacities and transit
times along its edges, the dynamic flow problem asks for the maximum flow from a specified
source to a sink within a given time bound [8]. As explained later, our model for a graph
sequence can be naturally generalized so that it covers dynamic flows.

(3) Dynamic graph algorithms. Dynamic graph problems are concerned with the construction
of data structures that make it possible to answer queries on a given graph property quickly,
with small update cost for edge insertions/deletions. Typical properties of concern include
connectivity [7, 10], transitive closures [15], cliques [21], bipartiteness, shortest path
distance, and so on. Dynamic graph algorithms could also find a period during which a property
is satisfied; however, they cannot extract local structures efficiently in a straightforward
way. For example, we would have to find time periods for all possible local structures, which
may take exponentially long computation time.

Contributions. In this paper, we propose a new notion, that is, a preserving structure in 
a graph sequence. Then by adopting this newly introduced notion, we pose two problems 
of mining preserving structures: one for cliques and the other for connected vertex subsets. 
As we have seen so far, both structures play important roles in sequences
of graphs that appear in practical situations.

We then propose efficient algorithms for solving the problems by enumerating all vertex 
subsets that are connected or cliques for a certain time period in a given graph sequence. 
For this purpose, we define a way of representing a graph sequence as the input format. In
this model, instead of representing a graph at each time by the difference from the previous
one, which is used in the dynamic graph model, we represent a graph sequence by explicitly
associating each edge with the time interval(s) during which it exists. Our model is novel
and differs from the existing ones (e.g., [5]) in the sense that it gives a new perspective on
graphs that change over time. This graph model introduces a new parameter, namely, the
number of time intervals. Since the running time of these algorithms can be estimated
by using this parameter, we also consider that it could be used as a new measure in
complexity studies.

Our algorithm for the enumeration of preserving connected vertex subsets is based on a
recursive graph partition, and our algorithm for preserving cliques is based on the reverse
search, which is a framework for designing efficient enumeration algorithms. While a
straightforward application of maximal clique enumeration to our problem requires exponential
time, our algorithm exploits properties of the time intervals of edges so that it becomes a
polynomial delay algorithm. Compared to a naive algorithm, this reduces the time complexity by
a factor of $|E|$.

Organization of this paper. We first give definitions and representations of graph
sequences and preserving structures together with basic terminology in Section 2. In Section 3,
we deal with the enumeration problem of preserving connected vertex subsets. Then we
discuss the closed active clique enumeration problem in Section 4. We conclude this
paper in Section 5.

2 Preliminaries 

2.1 A Graph Sequence and its Representation 

A graph $G$ is an ordered pair of a vertex set $V$ and an edge set $E$, and is denoted by
$G = (V,E)$. We suppose that the vertex set $V$ is $\{1,\ldots,n\}$ so that each vertex has an
index and can be treated as an integer. The neighborhood of a vertex $v \in V$ is the set
$N(v) = \{u \in V \mid \{u,v\} \in E\}$. The degree of a vertex $v$ is $|N(v)|$, and is denoted
by $\deg(v)$. We use $\Delta$ to denote the maximum degree of a graph. For a vertex subset $U$
($\subseteq V$), the induced subgraph $G[U]$ of $G$ by $U$ is the subgraph whose vertex set is
$U$ and whose edge set is composed of all edges in $E$ that connect vertices in $U$. For an edge
set $F$, let $V(F)$ denote the set of vertices that are endpoints of some edges in $F$. Then for
an edge subset $F$ ($\subseteq E$), we define the induced subgraph $G[F]$ of $G$ by $F$ as the
subgraph $G[V(F)]$.

A time stamp is an integer representing a discrete time, and we denote by $\mathcal{T}$ the
ground set of all possible time stamps during which our graph is supposed to exist. We assume
$\mathcal{T} = \{1,\ldots,t_{\max}\}$ without loss of generality, and a subset $T$ of
$\mathcal{T}$ is called a time stamp set. We say that an edge of a graph is active at time stamp
$t$ if it exists at that moment. The edge set $E$ of a supposed graph consists only of edges
that are active at some time stamps. To represent a graph sequence, we associate with each edge
the time stamp set on which it is active, which we call the active time stamp set of that edge.
We regard the active time stamp sets of edges as a mapping $\tau : E \to 2^{\mathcal{T}}$, and
thus we define a graph sequence as a pair of a graph $G = (V,E)$ and a mapping $\tau$, that is,
$(G,\tau)$. Then the active time stamp set of an edge $e$ is $\tau(e)$, and we define the active
time stamp set of an edge set $F$ to be $\tau(F) = \bigcap_{e \in F} \tau(e)$. Given a graph
sequence $(G,\tau)$, we define the closure graph $G_T$ of $G$ for a time stamp set $T$
($\subseteq \mathcal{T}$) as the spanning subgraph whose edge set consists of the
edges whose active time stamp sets include $T$, that is,
$G_T = (V, \{e \mid e \in E, T \subseteq \tau(e)\})$.
In particular, when $T = \{t\}$ is a singleton, we sometimes denote the closure graph for $T$ by
$G_t$ by convention. Intuitively, $G_t$ represents a snapshot of $G$ at time stamp $t$. By
definition, $G_T$ becomes $G$ if $T = \emptyset$.
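
As an illustration of the definitions above, the following is a minimal sketch in Python, not the authors' implementation. It assumes a hypothetical encoding in which an undirected edge is a pair (u, v) with u < v and the mapping $\tau$ is a dictionary from edges to sets of integer time stamps; the names `TAU`, `GROUND`, `active_time_stamps`, and `closure_graph_edges` are introduced here only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]  # hypothetical encoding: undirected edge (u, v) with u < v


def active_time_stamps(tau: Dict[Edge, Set[int]], F: Iterable[Edge],
                       ground: Set[int]) -> Set[int]:
    """tau(F): intersection of tau(e) over e in F (the ground set if F is empty)."""
    result = set(ground)
    for e in F:
        result &= tau[e]
    return result


def closure_graph_edges(tau: Dict[Edge, Set[int]], T: Set[int]) -> Set[Edge]:
    """Edge set of the closure graph G_T: all edges e with T a subset of tau(e)."""
    return {e for e, stamps in tau.items() if T <= stamps}


# Toy graph sequence on vertices {1, 2, 3} with ground time stamp set {1, 2, 3, 4}.
GROUND = {1, 2, 3, 4}
TAU = {(1, 2): {1, 2, 3}, (2, 3): {2, 3}, (1, 3): {3, 4}}

if __name__ == "__main__":
    print(closure_graph_edges(TAU, {2, 3}))   # edges active throughout {2, 3}
    print(active_time_stamps(TAU, TAU.keys(), GROUND))
```

With an empty time stamp set, `closure_graph_edges` returns every edge, matching the remark that $G_T$ becomes $G$ when $T = \emptyset$.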

A time stamp set is a (time) interval if it constitutes a single interval
$\{t, t+1, \ldots, t+\ell\}$ ($\ell \ge 0$). In this paper, it is sometimes assumed that the
active time stamp set of any edge is an interval, and we call this the interval assumption. Note
that we can assume this without loss of generality, since if the active time stamp set of an edge
is composed of multiple time intervals, we can replace the edge by a set of parallel edges (a
multi-edge), each of which has one of the intervals. Unlike existing representations, this way of
representing a graph sequence has an advantage in its extensibility. As a natural extension, we
consider edges connecting a vertex at a past time stamp to one at a future time stamp. In this
case, an edge could be represented together with its time interval by a tuple of five values
$(u, v, t_u, t_v, \ell)$, that is, vertices $u$ and $v$ are adjacent by an edge from times $t_u$
and $t_v$ until $t_u + \ell$ and $t_v + \ell$, respectively. By regarding the pair of a vertex
and a time stamp as a kind of super-vertex, such an edge can be seen as a set of parallel edges;
thus we call this extension a thick edge graph.
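
The reduction to the interval assumption mentioned above can be sketched as follows. This is an illustrative Python fragment under an assumed encoding (a dictionary from edges to sets of integer time stamps); the function names `split_into_intervals` and `to_parallel_edges` are hypothetical and do not come from the paper.

```python
def split_into_intervals(stamps):
    """Split a set of integer time stamps into maximal runs (start, end)."""
    runs = []
    for t in sorted(stamps):
        if runs and t == runs[-1][1] + 1:
            # t extends the current run by one time stamp
            runs[-1] = (runs[-1][0], t)
        else:
            # t starts a new run
            runs.append((t, t))
    return runs


def to_parallel_edges(tau):
    """Replace each edge by one parallel edge (u, v, start, end) per interval."""
    return [(u, v, start, end)
            for (u, v), stamps in sorted(tau.items())
            for (start, end) in split_into_intervals(stamps)]


if __name__ == "__main__":
    print(to_parallel_edges({(1, 2): {1, 2, 5}, (2, 3): {3}}))
```

Each output tuple carries exactly one interval, so the resulting multigraph satisfies the interval assumption.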

Although this thick edge graph model might seem unusual, it has several natural ap- 
plications. One example is a similarity graph on sequential data. We regard a pair of a 
sequence and a time stamp as a vertex. We draw an edge between two vertices when their 
corresponding sequences at the corresponding time stamps are similar. In sequential data,
two subsequences are often similar over consecutive time intervals, thus thick edges can
represent the data in a compact way. Another example is a dynamic flow, a dynamic version
of a network flow. In a dynamic flow, when a flow departs a vertex at time $t$ along an edge
$e$, it arrives at the other end vertex of $e$ at time $t + \ell(e)$, where $\ell(e)$ is the
length of $e$. Each edge has its own capacity $c(e)$, and thus pushing a flow of $f$ units along
$e$ takes $f/c(e)$ time. Therefore, a flow along an edge is equivalent to a thick edge. Thus,
preserving structures in a thick edge graph correspond to those composed of thick edges that
share vertices for sufficiently long time periods.

2.2 Preserving Structures 

Let $(G,\tau)$ be a graph sequence, where $G = (V,E)$ and $\tau : E \to 2^{\mathcal{T}}$ with a
ground time stamp set $\mathcal{T}$. We next consider preserving structures in a graph sequence,
that is, subgraphs that consecutively satisfy certain properties; in this paper these are
connected subgraphs and cliques. In particular, we will be interested in those that are maximal
in some sense. We note that the term "closed" appearing below is borrowed from the pattern
mining field [19]; a closed pattern is a maximal pattern that is not included in any other
pattern with the same frequency.

A vertex subset $U$ is connected if there exists a path between any two vertices of $U$. In
this case we also say that $G[U]$ is connected. A vertex subset $U$ is said to be connected on
a time stamp set $T$ if $U$ is connected at every time stamp in $T$, and we let $\gamma(U)$ be
the set of time stamps at which $U$ is connected. We say that a connected vertex subset $U$ is
closed if none of its proper supersets $U'$ satisfies $\gamma(U) = \gamma(U')$.
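
To make the set $\gamma(U)$ concrete, here is a small Python sketch. It assumes, as an illustration only, that $\tau$ is stored as a dictionary from edges (u, v) to sets of time stamps, that $U$ is non-empty, and that connectivity of $U$ at time $t$ means connectivity of the subgraph induced by $U$ in the snapshot $G_t$; the helper names are hypothetical.

```python
def is_connected(U, edges):
    """Check whether the subgraph induced by the non-empty vertex set U is connected."""
    U = set(U)
    adj = {v: set() for v in U}
    for u, v in edges:
        if u in U and v in U:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(U))
    seen = {start}
    stack = [start]
    while stack:  # depth-first search restricted to U
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == U


def gamma(U, tau, ground):
    """Set of time stamps t in the ground set at which U is connected in G_t."""
    return {t for t in ground
            if is_connected(U, [e for e, stamps in tau.items() if t in stamps])}


if __name__ == "__main__":
    tau = {(1, 2): {1, 2, 3}, (2, 3): {2, 3}, (1, 3): {3, 4}}
    print(gamma({1, 2, 3}, tau, {1, 2, 3, 4}))
```

In this toy instance the subset {1, 2} stays connected on more time stamps than {1, 2, 3}, so both can be closed at the same time.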

A clique is a complete subgraph of a graph. In this paper, we define a clique by its edge
set, and thus we do not regard a single vertex as a clique. A clique is called maximal if none
of its proper supersets is a clique. An edge set $F$ is called active if
$\tau(F) \neq \emptyset$, and $\tau(F)$ equals $\mathcal{T}$ if $F = \emptyset$. An active clique
$K$ in a graph sequence is closed if no other clique $K'$ such that $K \subset K'$ satisfies
$\tau(K) = \tau(K')$.

3 Enumeration of Preserving Connected Components 

In this section we study the closed connected vertex subsets in a graph sequence $(G,\tau)$,
where $G = (V,E)$ and $\tau : E \to 2^{\mathcal{T}}$ with a ground time stamp set $\mathcal{T}$.
We start by observing some properties of closed connected vertex subsets, and then present how
they can be enumerated.

We first have the following simple observations. 

Property 1 (closed under union). For two vertex subsets $U$ and $U'$, if both $U$ and $U'$ are
connected on a time stamp set $T$ and $U \cap U' \neq \emptyset$, then $U \cup U'$ is also
connected on $T$.

For two partitions $\mathcal{P}$ and $\mathcal{P}'$ of a universal set, let
$\mathcal{P} \wedge \mathcal{P}'$ denote the partition composed of the subsets given by the
intersections of members of $\mathcal{P}$ and $\mathcal{P}'$, i.e.,
$\mathcal{P} \wedge \mathcal{P}' = \{I \mid I = H \cap H', H \in \mathcal{P}, H' \in \mathcal{P}'\}$.
A connected component of $G$ is a maximal vertex subset $U$ such that $G[U]$ is connected. The
set of connected components of $G$ gives a partition of the vertex set, and we denote it by
$\mathcal{C}(G)$. For a time stamp set $T = \{t_1, \ldots, t_k\}$, let $\mathcal{P}(G,T)$
denote $\bigwedge_{i=1}^{k} \mathcal{C}(G_{t_i})$, which forms a partition of $V$.
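
The meet of two partitions can be sketched in a few lines of Python. This is an illustrative fragment only: partitions are assumed to be sets of frozensets, the name `meet` is hypothetical, and empty intersections are dropped so that the result is again a partition.

```python
def meet(P, Q):
    """Common refinement of partitions P and Q: all non-empty pairwise intersections."""
    return {H & H2 for H in P for H2 in Q if H & H2}


if __name__ == "__main__":
    P = {frozenset({1, 2, 3}), frozenset({4})}
    Q = {frozenset({1, 2}), frozenset({3, 4})}
    print(meet(P, Q))
```

Applying `meet` repeatedly to the component partitions of the snapshots $G_{t_1}, \ldots, G_{t_k}$ yields the partition denoted $\mathcal{P}(G,T)$ above.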

Property 2 (partition). A connected vertex subset $U$ on a time stamp set $T$ is included in
a member (vertex subset) of $\mathcal{P}(G,T)$.

Property 3 (subdivision). A connected vertex subset $U$ on a time stamp set $T$, where $U$ is
included in a vertex set $W$, is included in a vertex subset of $\mathcal{P}(G[W],T)$.

We denote the family of all maximal connected vertex subsets of $G$ on a time stamp set
$T$ by $\mathcal{C}(G,T)$. Property 1 ensures that $\mathcal{C}(G,T)$ is a partition of $V$. In
the subsequent discussions in this section, suppose that a time stamp set $T$ is an interval, and
let $T_{t,\ell}$ denote the interval time stamp set $\{t, t+1, \ldots, t+\ell\}$. In addition, we
assume for simplicity that both ends of any interval time stamp set can be examined in $O(1)$
time by appropriate preprocessing and data structures.

Then we have the following two lemmas. 

Lemma 1. For an interval time stamp set $T_{t,\ell}$ with a fixed time stamp $t$,
$\mathcal{C}(G,T_{t,\ell})$ for all $\ell$ ($\ge 0$) can be computed in $O(|V||E|^2)$ time.

Proof. We first compute $\mathcal{C}(G,T_{t,0}) = \mathcal{C}(G_t)$, which is simply the family
of connected components of $G_t$, in $O(|E|)$ time, and then compute each
$\mathcal{C}(G,T_{t,i})$ from $\mathcal{C}(G,T_{t,i-1})$. Suppose that we have computed
$\mathcal{C}(G,T_{t,i-1})$. If $U \in \mathcal{C}(G,T_{t,i-1})$ is connected in $G_{t+i}$, then
$U$ is connected on $T_{t,i}$, and is thereby a member of $\mathcal{C}(G,T_{t,i})$. If not, from
Properties 2 and 3, any $U' \in \mathcal{C}(G,T_{t,i})$ with $U' \subseteq U$ is included in a
member of $\mathcal{C}(G[U],\{t+i\})$. According to Property 3, we recursively compute
$\mathcal{P}(G[U'],T_{t,i})$ for all members $U'$ of $\mathcal{C}(G[U],\{t+i\})$, and repeat this
until $U'$ becomes connected on $T_{t,i}$. In this way, we can compute all members of
$\mathcal{C}(G[U],T_{t,i})$. The time complexity of this computation is $O(|E|)$ for checking the
connectivity of each $U \in \mathcal{C}(G,T_{t,i-1})$ at time stamp $t+i$, and $O(p|E|i)$ time
for the computation of $\mathcal{P}$, where
$p = |\mathcal{C}(G,T_{t,i})| - |\mathcal{C}(G,T_{t,i-1})|$.
Now, without loss of generality, since any time stamp appears as either a starting or an
ending time stamp of an edge, we have $\ell = O(|E|)$. Thus, in total, the computation for all
$i$ ($0 \le i \le \ell$) takes $O(|E|^2)$ time for the former, and $O(|V||E|^2)$ time for the
latter. Therefore the statement holds. □
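
The recursive partition idea behind the proof can be sketched as follows. This is a simplified, non-incremental Python illustration and not the procedure of Lemma 1 itself: it recomputes the partition for a given non-empty time stamp set $T$ from scratch instead of updating it from $T_{t,i-1}$ to $T_{t,i}$, so it does not attain the stated time bound. The encoding of $\tau$ as a dictionary and all function names are assumptions made for this example.

```python
def components(U, edges):
    """Connected components of the subgraph induced by vertex set U."""
    adj = {v: set() for v in U}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for s in U:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def partition_P(U, tau, T):
    """Meet of the component partitions of G_t[U] over all t in T."""
    blocks = {frozenset(U)}
    for t in T:
        active = [e for e, stamps in tau.items() if t in stamps]
        comps = components(U, active)
        blocks = {b & c for b in blocks for c in comps if b & c}
    return blocks


def maximal_connected_subsets(V, tau, T):
    """Maximal vertex subsets connected on T, found by repeated refinement."""
    result, stack = set(), [frozenset(V)]
    while stack:
        U = stack.pop()
        blocks = partition_P(U, tau, T)
        if len(blocks) == 1:
            result.add(U)         # G_t[U] is connected for every t in T
        else:
            stack.extend(blocks)  # refine each strictly smaller block
    return result


if __name__ == "__main__":
    tau = {(1, 2): {1}, (2, 3): {1}, (1, 4): {2}, (3, 4): {2}}
    print(maximal_connected_subsets({1, 2, 3, 4}, tau, [1, 2]))
```

The example shows why a single meet is not enough: vertices 1 and 3 share a component at both time stamps, but only through vertices outside {1, 3}, so the block {1, 3} is split again in the next round.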

Lemma 2. Any member $U$ of $\mathcal{C}(G,T)$ is a closed connected vertex subset of $G$ on an
interval time stamp set $T$.

Proof. By the construction of $\mathcal{C}(G,T)$, no proper superset of a member of
$\mathcal{C}(G,T)$ is connected on $T$. This implies that for each $U \in \mathcal{C}(G,T)$, no
proper superset of $U$ is connected on $\gamma(U)$. This concludes the lemma. □

Lemma 2 motivates us to compute $\mathcal{C}(G,T)$ for all possible interval time stamp sets $T$
to enumerate all closed connected vertex subsets. For each time stamp $t$, we compute
$\mathcal{C}(G,T_{t,\ell})$ for the interval time stamp set
$T_{t,\ell} = \{t, t+1, \ldots, t+\ell\}$ for all possible $\ell$. From Lemma 1, this
computation can be done in $O(|V||E|^2)$ time. Thus we obtain the following theorem, where
we use again the fact that $\ell = O(|E|)$.

Theorem 1. In a graph sequence $(G,\tau)$, all closed connected vertex subsets can be
enumerated in $O(|V||E|^3)$ time. □



The correctness of this algorithm relies only on the above three properties, therefore 
the algorithm can be applied to similar connectivity conditions satisfying these properties, 
such as strong connectivity of a directed graph and two-edge connectivity of a graph. 

Theorem 2. In a graph sequence $(G,\tau)$ in which $G$ is a directed graph, all closed strongly
connected vertex subsets can be enumerated in $O(|V||E|^3)$ time. □

Theorem 3. In a graph sequence $(G,\tau)$, all closed two-edge connected vertex subsets
can be enumerated in $O(|V||E|^3)$ time. □

In the case of two-vertex connectivity, Property 1 holds only when the intersection
of the two components contains at least two vertices. Thus, $\mathcal{C}(G,T)$ may not be a
partition of the vertex set. Instead of a vertex set, we represent a connected vertex subset by
all vertex pairs included in the subset. Using this representation, when two subsets share at
most one vertex, the intersection of their representations is the empty set. This
representation also satisfies the other two properties, thus we have the following theorem.

Theorem 4. In a graph sequence $(G,\tau)$, all closed two-vertex connected vertex subsets can
be enumerated in $O(|V|^2|E|^3)$ time. □



4 Enumeration of Closed Active Cliques 



This section discusses the enumeration of all closed active cliques in a graph sequence
$(G,\tau)$. We first give some additional definitions for further arguments and observe some
basic properties of closed active cliques. After that we state a simple output polynomial
time algorithm as a warm-up, and then we present a more efficient algorithm based on the
reverse search, whose time complexity is much smaller than that of the simple algorithm.



For a time stamp set $T$, let $N_T(v) = \{w \mid w \in N(v), T \subseteq \tau(\{v,w\})\}$ and
$N_T(F) = \bigcap_{v \in V(F)} N_T(v)$ for an edge set $F$, that is, $N_T(v)$ is the set of
vertices adjacent to $v$ at all time stamps in $T$ and $N_T(F)$ is the set of vertices adjacent
to all vertices in $V(F)$ at every time stamp in $T$. For an edge set $F$ and a vertex set $U$,
$F \setminus U$ denotes the edge set obtained from $F$ by removing all edges incident to some
vertices in $U$, and $F \cap U$ denotes $F \setminus (V \setminus U)$. For an edge set $F$ and a
vertex $v$, let $M(F,v)$ denote the set of edges connecting $v$ and a vertex in $V(F)$. Let
$\Gamma(F)$ be the set of vertices $v$ such that $\tau(F) \subseteq \tau(M(F,v))$.
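
The two neighborhood notions can be illustrated with a short Python sketch. It assumes a hypothetical encoding of $\tau$ as a dictionary from edges (u, v) to sets of time stamps, and it requires the edge set $F$ to be non-empty; the function names are introduced only for this example.

```python
def neighbors_on(tau, v, T):
    """N_T(v): vertices joined to v by an edge that is active on all of T."""
    out = set()
    for (a, b), stamps in tau.items():
        if T <= stamps:
            if a == v:
                out.add(b)
            elif b == v:
                out.add(a)
    return out


def common_neighbors_on(tau, F, T):
    """N_T(F): intersection of N_T(v) over the endpoints v of edges in F."""
    verts = {x for e in F for x in e}
    if not verts:
        raise ValueError("F is assumed to be non-empty in this sketch")
    return set.intersection(*(neighbors_on(tau, v, T) for v in verts))


if __name__ == "__main__":
    tau = {(1, 2): {1, 2, 3}, (1, 3): {2, 3}, (2, 3): {2, 3, 4}, (3, 4): {1}}
    print(common_neighbors_on(tau, [(1, 2)], {2, 3}))
```

In the toy instance, vertex 3 is adjacent to both endpoints of the edge (1, 2) throughout the time stamps 2 and 3, so it is the only candidate for extending that edge to a larger clique on this period.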

Now let $F_{\le i}$ be the edge set obtained from $F$ by removing edges incident to vertices
whose index is greater than $i$. By definition, $F_{\le i}$ is empty if $i < 1$, and is $F$ if
$i \ge n$. A lexicographic order on a family of sets is a total order defined in such a way that
a set $F$ is smaller than $F'$ when the smallest element in their symmetric difference
$F \triangle F'$ belongs to $F$. For an active clique $K$ in a graph sequence, let $X(K)$ denote
the lexicographically smallest closed clique including $K$ among all closed cliques $K'$ such
that $\tau(K') = \tau(K)$.

4.1 A Simple Algorithm 

Let $(G,\tau)$ be a graph sequence, where $G = (V,E)$ and $\tau : E \to 2^{\mathcal{T}}$ with a
ground time stamp set $\mathcal{T}$. We first observe a few basic properties of closed active
cliques in a graph sequence. Recall that a clique is defined by an edge set in this paper.

Lemma 3. For any active clique $K$, $X(K)$ can be computed in $O(\min\{|E|, \Delta^2\})$ time.

Proof. We can obtain $X(K)$ by iteratively choosing the minimum vertex $v$ in
$N_{\tau(K)}(K)$ and adding the edges of $M(K,v)$ to $K$, until $N_{\tau(K)}(K) = \emptyset$.
$N_{\tau(K)}(K)$ can be computed in $O(\min\{|E|, \Delta^2\})$ time by scanning all edges
adjacent to some edges in $K$. When we add $v$ to $K$, the updated set
$N_{\tau(K)}(K \cup M(K,v))$ can be computed in $O(\deg(v))$ time by checking
whether $\tau(K) \subseteq \tau(\{u,v\})$ or not for each $u \in N_{\tau(K)}(K)$. Therefore the
statement holds.

□
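
The greedy extension used in this proof can be sketched in Python as follows. This is a naive illustration, not an implementation attaining the stated time bound: it rescans all vertices in every round. It assumes that a clique is given by its vertex set (at least two vertices), that $\tau$ is a dictionary from edges (u, v) with u < v to sets of time stamps, and that the input clique is active; the names `edge`, `tau_of_clique`, and `closure` are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """Canonical key of the undirected edge {u, v}."""
    return (u, v) if u < v else (v, u)


def tau_of_clique(tau, verts, ground):
    """Active time stamp set of the clique on verts (empty if an edge is missing)."""
    result = set(ground)
    for u, v in combinations(verts, 2):
        result &= tau.get(edge(u, v), set())
    return result


def closure(tau, verts, ground):
    """Greedily add the minimum vertex adjacent to the whole clique on tau(K)."""
    K = set(verts)
    T = tau_of_clique(tau, K, ground)
    if not T:
        raise ValueError("K must be an active clique")
    vertices = {x for e in tau for x in e}
    while True:
        candidates = [w for w in vertices - K
                      if all(T <= tau.get(edge(w, v), set()) for v in K)]
        if not candidates:
            return K
        K.add(min(candidates))  # adding w keeps tau(K) unchanged


if __name__ == "__main__":
    tau = {(1, 2): {1, 2, 3}, (1, 3): {2, 3}, (2, 3): {2, 3, 4}, (3, 4): {1}}
    print(closure(tau, {1, 3}, {1, 2, 3, 4}))
```

Because every added vertex is adjacent to the current clique on all of $\tau(K)$, the active time stamp set never shrinks during the extension.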

Lemma 4. For any time stamp set $T$, any maximal clique $K$ in $G_T$ is closed.

Proof. If $K$ is not closed, $G_{\tau(K)}$ includes a clique $K'$ such that $K \subset K'$.
Since $T \subseteq \tau(K)$, $T \subseteq \tau(e)$ holds for any edge $e \in K'$. This implies
that $K'$ is a clique in $G_T$, which contradicts the assumption. □

Conversely, it is easy to see that any closed active clique $K$ is a maximal clique in the
graph $G_{\tau(K)}$. This motivates us to compute all maximal cliques in all closure graphs of
possible active time stamp sets for enumerating all closed active cliques.

Lemma 5. All closed active cliques can be enumerated in $O(|V||E|^3)$ time for each, under
the interval assumption.

Proof. Under the interval assumption, the active time stamp set of any closed active clique
is also an interval. Both ends of such an interval are given by the active time stamp sets of
some edges, thus the number of these intervals is bounded by $|E|^2$. Let $\mathcal{K}$ be the
family of cliques each of which is a maximal clique in a closure graph of some of
those active time stamp sets. Then, from Lemma 4, we can see that $|\mathcal{K}|$ is bounded by
the product of $|E|^2$ and the number of closed active cliques. By using the algorithm in [18],
the maximal cliques can be enumerated in $O(|V| + |E|)$ time for each, and thus the maximal
cliques in $\mathcal{K}$ can be enumerated in $O((|V| + |E|)|\mathcal{K}|)$ time. To check
whether an enumerated clique $K$ is closed or not, we compute $X(K)$ in $O(|V| + |E|)$ time.
Since a closed active clique can be a maximal clique of $G_T$ for at most $|E|^2$ time stamp
sets $T$, the closed active cliques can be enumerated in $O(|V||E|^3)$ time for each. □

4.2 An Efficient Algorithm based on the Reverse Search 

The reverse search is a scheme for constructing enumeration algorithms, and was originally 
proposed by Avis and Fukuda [3] for some problems such as enumeration of vertices of 
a polytope. The key idea of the reverse search is to define an acyclic relation among the
objects including the ones to be enumerated. An acyclic relation induces a tree, which
results in a so-called parent-child relation, and we call the tree a family tree. Hence
enumerating objects is realized by traversing the tree according to the parent-child relation 
to visit all the objects. In fact, the reverse search algorithm performs a depth-first search 
on the tree induced by the parent-child relation, and is implemented by a procedure for 
enumerating all children of a given object. It starts from the root object that has no parent 
and enumerates its children, and then it recursively enumerates children for each child. 

It is easy to see the correctness of the algorithm; that is, the tree induced by the
parent-child relation spans all the objects, and the algorithm visits all the vertices of the
tree by a depth-first search. When a procedure for enumerating children takes at most $O(A)$
time for each child, the computation time of the reverse search algorithm is bounded by
$O(AN)$, where $N$ is the number of objects to be enumerated. Hence, if $A$ is polynomial in
terms of the input size, the entire reverse search algorithm takes output polynomial time.
In the following, we carefully observe the properties of a graph sequence, and prove that 
enumeration of children can be done in polynomial time. 
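
The general scheme can be sketched in Python as a depth-first traversal driven only by a child-enumeration procedure. The fragment below is a generic illustration and not the clique algorithm of this paper: the toy parent relation (the parent of a non-empty subset of {1, ..., n} is obtained by removing its largest element) and the names `reverse_search` and `make_children` are assumptions chosen for the example.

```python
def reverse_search(root, children):
    """Visit every object of the family tree by depth-first search from the root."""
    stack = [root]
    while stack:
        obj = stack.pop()
        yield obj
        stack.extend(children(obj))


def make_children(n):
    """Toy child procedure: the children of S are S plus one element above max(S)."""
    def children(S):
        start = max(S) + 1 if S else 1
        return [S | {j} for j in range(start, n + 1)]
    return children


if __name__ == "__main__":
    for subset in reverse_search(frozenset(), make_children(3)):
        print(sorted(subset))
```

Since every object has exactly one parent, each subset is produced once, and no table of already visited objects is needed; this is the property that keeps the memory usage of reverse search small.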

Now a more efficient algorithm for the enumeration of closed active cliques can be designed
based on the reverse search. We start by giving some definitions and fundamental
observations. The scheme of the reverse search has already been applied to the enumeration of
maximal cliques [18], and our algorithm for closed active cliques adopts its ideas. For an active
clique $K$, let $i(K)$ be the minimum vertex $i$ satisfying $X(K_{\le i}) = K$. We define the
parent $P(K)$ of a closed active clique $K$ by $X(K_{\le i(K)-1})$, and $P(K)$ is not defined
for $K = X(\emptyset)$, which is called the root of the family tree.

Lemma 6. The parent-child relation defined by P is acyclic. 

Proof. Suppose that $K$ is a closed active clique such that $P(K)$ is defined. $P(K)$ is
generated by removing vertices one by one from $K$ and adding vertices so that the active time
set does not change, thus $\tau(P(K))$ always includes $\tau(K)$. By definition,
$P(K) = X(K_{\le i(K)-1})$ is lexicographically smaller than $K$ when
$\tau(P(K)) = \tau(K)$. Thus, either (a) $P(K)$ has a
larger active time set than $K$, or (b) $P(K)$ has the same active time set as $K$ and is
lexicographically smaller than $K$. Therefore the statement holds. □

Lemma 7. Any vertex in $P(K)_{\le i(K)} \setminus K$ does not belong to $N_{\tau(K)}(i(K))$,
and therefore $K_{\le i(K)-1} = P(K)_{\le i(K)} \cap N_{\tau(K)}(i(K))$.

Proof. Suppose that a vertex $v$ in $P(K)_{\le i(K)} \setminus K$ belongs to
$N_{\tau(K)}(i(K))$. Then, $X(K_{\le i(K)})$ has to include either $v$ or another vertex
$u < v$. It implies that $X(K_{\le i(K)}) \cap \{1, \ldots, i(K)\} \neq K_{\le i(K)}$, thereby
$X(K_{\le i(K)}) \neq K$. This contradicts the definition of $i(K)$. □

A subset $F$ of $M(K,v)$ is called time maximal if $F$ is included in no other subset $F'$
of $M(K,v)$ satisfying $\tau(F) \cap \tau(K) = \tau(F') \cap \tau(K)$. Let $I(K,v)$ be the set
of all time maximal subsets of $M(K,v)$. For a time maximal subset $F \in I(K,v)$, we define
$C(K,F) = X(K_{\le v} \cap V(F) \cup F)$.

Lemma 8. If $K'$ is a child of a non-root closed active clique $K$, then $K' = C(K,F)$ holds
for some vertex $v$ and $F \in I(K,i(K'))$.

Proof. Let $F = M(K'_{\le i(K')}, i(K'))$. From Lemma 7,
$K'_{\le i(K')-1} = K_{\le i(K')-1} \cap V(N_{\tau(K')}(i(K')))$
holds, and thus $K' = X(K_{\le i(K')-1} \cap V(F) \cup F)$.

We next show that $F$ is a member of $I(K,i(K'))$. Suppose that $K'$ is a child of $K$, and $F$
does not belong to $I(K,i(K'))$, i.e., $F$ is properly included in an edge subset
$F' \in I(K,i(K'))$ such that $\tau(F) = \tau(F')$. Then, the active time set of
$K_{\le i(K')} \cap V(F')$ is the same as that of
$K'_{\le i(K')} = K_{\le i(K')-1} \cap V(F) \cup F$. This implies that $X(K'_{\le i(K')-1})$
includes several edges in $F'$, which contradicts the definition of $i(K')$. □

Since $X(K_{\le v}) \neq K$ holds for any $v < i(K)$, we have the following corollary.

Corollary 1. $C(K,F)$ is not a child of $K$ for any $F \in I(K,v)$ satisfying $v < i(K)$.

Any child is $C(K,F)$ for some $F$. However, $C(K,F)$ is not always
a child; that is, $C(K,F)$ is a child of $K$ if and only if $K = P(C(K,F))$. This implies
that we can check whether $C(K,F)$ is a child or not by computing $P(C(K,F))$. Therefore, from
Lemma 8, we obtain the following procedure to enumerate the children of $K$. To avoid
duplicated output of the same child $K'$, we output $K'$ only when $K'$ is generated from
$F \in I(K,i(K'))$.

Procedure EnumChildren($K$: non-root closed active clique)

1. for each $F \in I(K,v)$, $v > i(K)$ do

2. compute $C(K,F)$;

3. compute $i(C(K,F))$ and $P(C(K,F))$;

4. if $K = P(C(K,F))$ and $i(C(K,F)) = v$ then output $C(K,F)$;

5. end for

For analyzing the complexity of this procedure, which will later be used as a subrou- 
tine of the entire algorithm for enumerating closed active cliques, we show some technical 
lemmas. 

Lemma 9. $P(K)$ can be computed in $O(|E|)$ time.

Proof. Suppose that $K$ is not the root, i.e., $P(K)$ is defined. Let $K'$ be initialized to the
empty set, and we add the vertices of $K$ to $K'$ one by one, from the smallest vertex in
increasing order. In each addition, we maintain the change of $\tau(K')$ and
$N_{\tau(K)}(K')$. Then, we can find the minimum vertex $v$ satisfying
$\tau(K_{\le v}) = \tau(K)$, and the minimum vertex $u$
satisfying $i = \min\{N_{\tau(K)}(K_{\le i-1})\}$ for any $i \in K$, $i > u$. We have
$i(K) = \max\{u,v\}$, since $X(K_{\le j}) \neq K$ holds when either
$\tau(K) \neq \tau(K_{\le j})$ or $i \neq \min\{N_{\tau(K)}(K_{\le i-1})\}$ holds for some
$i \in K$, $i > j$. Under the assumption that both ends of any interval time stamp set can
be examined in $O(1)$ time, $\tau(K \cup \{e\})$ can be computed in $O(1)$ time from $\tau(K)$
for any edge $e$. Thus, we can compute $i(K)$ in $O(\min\{|E|, \Delta^2\})$ time. Together with
Lemma 3, the statement holds. □
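
The constant-time update of $\tau(K \cup \{e\})$ under the interval assumption amounts to intersecting two intervals. A minimal Python sketch, assuming that an interval is stored as a pair (start, end) of integers and that `None` stands for the empty time stamp set (both are conventions chosen for this example):

```python
def intersect(a, b):
    """Intersection of two closed integer intervals; None means the empty set."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    tau_K = (2, 7)                   # current active interval of the clique
    print(intersect(tau_K, (5, 9)))  # active interval after adding an edge
```

Only the two end points are compared, which is why the update takes constant time once the ends of each interval are available.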

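Lemma 10. If $K$ is not the root, any child $K'$ of $K$ satisfies
$K_{\le i(K')} \cap K'_{\le i(K')} \neq \emptyset$.

Proof. If $K_{\le i(K')} \cap K'_{\le i(K')} = \emptyset$, it holds that
$K'_{\le i(K')-1} \cap K = \emptyset$. Since $K'_{\le i(K')-1}$ is always
included in $K$, we have $K'_{\le i(K')-1} = \emptyset$. Therefore, $P(K') = X(\emptyset)$,
which implies that $P(K') = K$ is the root. □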

Lemma 11. If $K$ is not the root, the children of $K$ are enumerated by evaluating at most
$\min\{\Delta|E|, \Delta^3\}$ edge sets under the interval assumption.

Proof. By the interval assumption, the ends of the active time set of any subset $F$ in
$I(K,v)$ are given by the ends of some edges in $F$, and thus $|I(K,v)|$ is bounded from above
by the number of pairs of edges in $M(K,v)$.
Lemma 10 ensures that if $K$ is not the root, Step 2 of EnumChildren does not have to
take care of vertices not adjacent to any vertex of $V(K)$. This means that we have to take
care only of non-empty maximal subsets in $I(K,v)$. Let $I$ be the union of all non-empty
subsets of $I(K,v)$. Since each edge in $F \in I(K,v)$ is incident to some vertices in $K$, we
have $|I| \le \min\{|E|, \Delta^2\}$. It implies that the number of possible choices of two
edges from some non-empty $I(K,v)$ is bounded from above by
$\Delta \cdot \min\{|E|, \Delta^2\}$. □

By the above lemmas, we can estimate the time complexity of the procedure of enumer- 
ating children. 

Lemma 12. Procedure EnumChildren enumerates all children of $K$ in
$O(\min\{\Delta^5, |E|^2\Delta\})$ time under the interval assumption.

Proof. The correctness of the procedure comes from Lemma 8. We note that the procedure
never outputs any child more than once, since each child $K'$ is generated from its unique
parent and a maximal subset $F \in I(K,i(K'))$. We then observe that all non-empty subsets
$F \in I(K,v)$, $v > i(K)$, can be computed in $O(\min\{|E|, \Delta^2\})$ time by scanning all
edges adjacent to some edges in $K$, and $C(K,F)$ can be computed in
$O(\min\{|E|, \Delta^2\})$ time in a straightforward manner. From Lemma 11, the procedure
iterates the loop for $\min\{\Delta|E|, \Delta^3\}$ edge sets, and each edge set takes
$O(\min\{|E|, \Delta^2\})$ time from Lemma 9. Thus, we conclude
the lemma. □
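
As a worked check of the last step, assuming the bounds read $\min\{\Delta|E|, \Delta^3\}$ edge sets and $O(\min\{|E|, \Delta^2\})$ time per edge set, the product of the two minima can be evaluated case by case, because both minima switch at the same threshold $|E| = \Delta^2$:

```latex
\[
\min\{\Delta|E|,\ \Delta^3\}\cdot\min\{|E|,\ \Delta^2\}
=
\begin{cases}
\Delta|E|\cdot|E| = |E|^2\Delta & \text{if } |E|\le\Delta^2,\\[2pt]
\Delta^3\cdot\Delta^2 = \Delta^5 & \text{if } |E|>\Delta^2,
\end{cases}
\;=\; \min\{\Delta^5,\ |E|^2\Delta\}.
\]
```

The final equality holds because $|E|^2\Delta \le \Delta^5$ exactly when $|E| \le \Delta^2$.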

Now we describe our algorithm for enumerating all closed active cliques in a graph
sequence based on the reverse search as follows. It is presented in a slightly different form by
introducing a threshold $\sigma$ with respect to the length of active time stamp sets, observing
that $\tau(K) \subseteq \tau(P(K))$ always holds. When called with $X(\emptyset)$, it enumerates
all closed active cliques having active time sets larger than $\sigma$ (and thus enumerates all
of them when $\sigma$ is set to 0).



Algorithm EnumClosedActiveClique(K: closed active clique)

1. output K; prv := nil;
2. if prv = nil then K' := the first clique found by EnumChildren(K);
   else K' := the clique found just after prv by EnumChildren(K);
3. if there is no such clique K' then go to Step 8;
4. K := K'; free up the memory for K';
5. if |t(K)| > σ then call EnumClosedActiveClique(K);
6. K := P(K);
7. go to Step 2;
8. if K is not the root then return;
9. for each e ∈ E do
10.   if e is lexicographically minimum in X({e}) then EnumClosedActiveClique(X({e}));
11. end for
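
The control flow of Steps 2 to 7 (find the next child after prv, descend, then return to the parent) can be sketched in a few lines. The following is a minimal illustration of that traversal pattern only, not the authors' implementation. The objects here are increasing tuples over {0, ..., N-1}, a hypothetical stand-in for closed active cliques; the names `reverse_search`, `next_child`, `toy_parent` and `toy_children` are assumptions made for the example, and the σ test of Step 5 is omitted.

```python
def next_child(node, prv, children):
    """Return the child of `node` that follows `prv` (the first child if prv is None)."""
    passed_prv = prv is None
    for child in children(node):  # the child enumeration restarts from the beginning
        if passed_prv:
            return child
        if child == prv:
            passed_prv = True
    return None  # no child left after prv


def reverse_search(root, parent, children):
    """Visit every object of the tree defined by `parent`, keeping only cur and prv."""
    visited = [root]
    cur, prv = root, None
    while True:
        nxt = next_child(cur, prv, children)
        if nxt is not None:
            visited.append(nxt)          # "output K"
            cur, prv = nxt, None         # descend into the child
        elif cur == root:
            return visited               # all children of the root are done
        else:
            cur, prv = parent(cur), cur  # go back to the parent, remember where we were


# Toy instance (hypothetical): objects are increasing tuples over {0, ..., N-1}.
N = 3

def toy_parent(s):
    return s[:-1]  # drop the largest element

def toy_children(s):
    first = s[-1] + 1 if s else 0
    return (s + (x,) for x in range(first, N))

order = reverse_search((), toy_parent, toy_children)
```

Because the traversal stores only the current object and the previously visited child, its working memory does not grow with the depth of the search tree; the price is that each call to `next_child` rescans the children from the start.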



Finally, we can establish the following theorem. 

Theorem 5. Under the interval assumption, Algorithm EnumClosedActiveClique enumerates
all closed active cliques in a graph sequence in O(N min{Δ^5, |E|^2 Δ}) time and in
O(|V| + |E|) space, where N is the number of closed active cliques in the graph sequence.

Proof. The correctness of the algorithm is easy to see from the framework of the reverse
search and Lemma 6. The computation time of the reverse search is given by the product
of the number of objects to be enumerated and the computation time on each object.
From Lemma 12, an iteration requires O(min{Δ^5, |E|^2 Δ}) time for non-root closed active
cliques. For the root K = X(∅), we can enumerate its children K' satisfying the condition of
Lemma 10 in O(min{Δ^5, |E|^2 Δ}) time using procedure EnumChildren. When K_{≤i(K')} ∩
K'_{≤i(K')} = ∅, we have K'_{≤i(K')-1} = ∅. This implies that K'_{≤i(K')} is composed of an
edge; thus, by generating X({e}) for all e ∈ E, we can enumerate the children that do not
satisfy the condition of Lemma 10 in O(min{|E|^2, |E|Δ^2}) time. Note that duplication
can be avoided by outputting X({e}) only when e = argmin X({e}). Since N ≥ |E|/Δ^2,
it holds that min{|E|^2, |E|Δ^2} ≤ N min{Δ^5, |E|^2 Δ}. Therefore, the time complexity of the
algorithm is as stated.

In a straightforward implementation of the algorithm, each iteration may take + 
|£^|) space for keeping the intermediate results of the computation in memory, especially for 
all C(K, F). We can reduce this by restarting the iteration from the beginning. When we find
a child K' of K, we immediately generate the recursive call with K', before the termination
of the enumeration of the children. After the termination of the recursive call, we resume
the enumeration of the children. To save memory, we restart from the beginning of
the iteration, pass through the children found before K', and reconstruct all the
necessary variables. We note that the time complexity does not change by the restart,
since the number of restarts is bounded by the number of recursive calls generated by the
algorithm. A child is given by a maximal edge subset, and a maximal edge subset is given by
two edges. Thus, we can memorize a child by a constant number of variables. The clique K
is constructed by computing P(K'); thus it is also not necessary to keep K in memory, and it
can be reconstructed without increasing the time complexity. The iteration with respect
to the root takes O(|V| + |E|) space; therefore, we have the statement of the theorem. □

As we stated, since t(K) ⊆ t(P(K)) always holds, we have the following corollary.



Corollary 2. Under the interval assumption, Algorithm EnumClosedActiveClique enumerates
all closed active cliques having active time sets no shorter than a given threshold σ in
O(min{Δ^5, |E|^2 Δ}) time for each and in O(|V| + |E|) space. □
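
For small instances, the output of such an enumeration can be cross-checked against a brute-force search. The sketch below is not the reverse search algorithm; it simply tries vertex subsets in increasing order and prunes with the same monotonicity (adding a vertex can only shrink the active time set). It assumes the interval assumption (one inclusive interval per edge), takes "closed" to mean that no additional vertex yields a clique with the same active time set, and measures the length of an active time set as its number of time stamps. The function names and the example graph are hypothetical.

```python
from itertools import combinations


def active_interval(vertices, interval):
    """Intersect the edge intervals of the clique on `vertices`.

    `interval` maps a sorted vertex pair (u, v) to an inclusive (start, end).
    Returns None if some pair is not an edge or the intersection is empty.
    """
    lo, hi = float("-inf"), float("inf")
    for u, v in combinations(sorted(vertices), 2):
        if (u, v) not in interval:
            return None
        start, end = interval[(u, v)]
        lo, hi = max(lo, start), min(hi, end)
    return (lo, hi) if lo <= hi else None


def closed_active_cliques(vertex_list, interval, sigma=1):
    """Brute-force list of (clique, active interval) with at least sigma time stamps."""
    results = []

    def extend(clique, start):
        for idx in range(start, len(vertex_list)):
            cand = clique + [vertex_list[idx]]
            if len(cand) >= 2:
                t = active_interval(cand, interval)
                # Supersets can only have smaller active time sets, so prune here.
                if t is None or t[1] - t[0] + 1 < sigma:
                    continue
                others = [w for w in vertex_list if w not in cand]
                # Closed: no single added vertex keeps the same active time set.
                if all(active_interval(cand + [w], interval) != t for w in others):
                    results.append((tuple(cand), t))
            extend(cand, idx + 1)

    extend([], 0)
    return results


# Hypothetical example: four vertices, one interval per edge.
example = {("a", "b"): (1, 10), ("a", "c"): (3, 8), ("b", "c"): (2, 9), ("c", "d"): (5, 6)}
all_closed = closed_active_cliques(["a", "b", "c", "d"], example, sigma=1)
long_closed = closed_active_cliques(["a", "b", "c", "d"], example, sigma=7)
```

In this example the clique on {a, c} is not reported, because adding b gives a larger clique with the same active interval (3, 8).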

Note again that the interval assumption can be set without loss of generality, since
we can replace an edge with multiple time intervals in its active time stamp set by parallel
edges, each having a single time interval. However, this transformation increases the
degrees of the vertices, thus the time complexity may increase. If we set Δ to the maximum
degree of the transformed graph, then the results hold.
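
The transformation described above is easy to state in code. The following is a minimal sketch, assuming that time stamps are integers, that intervals are given as inclusive (start, end) pairs, and that overlapping or adjacent intervals of the same edge should be merged before splitting. All function names and the example data are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent inclusive integer intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_into_parallel_edges(edge_intervals):
    """Replace each edge by one parallel edge per maximal time interval."""
    parallel = []
    for (u, v), intervals in edge_intervals.items():
        for start, end in merge_intervals(intervals):
            parallel.append((u, v, start, end))
    return parallel


def max_degree(parallel_edges):
    """Maximum degree, counting every parallel edge separately."""
    degree = defaultdict(int)
    for u, v, _, _ in parallel_edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values(), default=0)


# Hypothetical example: edge (a, b) is active during two separate intervals.
edge_intervals = {("a", "b"): [(1, 3), (7, 9)], ("b", "c"): [(2, 8)]}
parallel = split_into_parallel_edges(edge_intervals)
delta = max_degree(parallel)
```

In this example vertex b has degree 2 in the original graph but degree 3 after the transformation, which is the increase in Δ that the text warns about.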

4.3 Extension to Thick Edge Graphs 

We consider the extension of our algorithm to "thick edge graphs". In a thick edge graph, a
clique is composed of several vertices with different time stamps. Hence, a clique is supposed
to be "vertex v_1 at time stamp t_1, ..., and vertex v_k at time stamp t_k are fully connected".
We thus associate a non-negative number s(v), called a shift, with each vertex v to define the
active time stamp set for vertex sets. For an edge set K and a set S of shifts s(v) for vertices v in
V(K) = {v_1, ..., v_k}, their active time stamp set is defined by the set of t such that "vertex
v_1 at time stamp s(v_1) + t, ..., and vertex v_k at time stamp s(v_k) + t form a clique". We
exclude its ambiguity by setting one of s(v) to 0.

A closed active clique in a thick edge graph is defined by a pair of an edge set K and 
shifts S such that no clique with the same shift for vertices of V{K) includes K. Once we 
fix shifts of all vertices in the graph, the enumeration of closed active cliques in a thick edge 
graph is equivalent to that in a graph sequence. Although the exhaustive search may take
exponential time, our enumeration algorithm based on the reverse search still works even
in thick edge graphs. 

First, we define the lexicographic order on the set of pairs of a vertex and its shift,
i.e., {(v_1, s(v_1)), ..., (v_k, s(v_k))}. Then, X(K) and P(K) are defined in the same way as
on a graph sequence, and their computation can be done in the same time complexity.
A child is obtained from its parent by adding a vertex w and setting the shift of w, and
Lemmas 7, 8 and 10 also hold. Since the choice of the shift of w depends on the choice of
the edge to be added to K, the number of children of a closed active clique is also bounded
by min{Δ|E|, Δ^3}, which implies that Lemma 11 also holds. Thus, we have the following
corollary.

Corollary 3. Under the interval assumption, all closed active cliques in a thick edge graph
with active time stamp sets no shorter than a given threshold σ can be enumerated in
O(min{Δ^5, |E|^2 Δ}) time for each within O(|V| + |E|) space. □

5 Conclusion 

In this paper, we focused on the structures preserved in a sequence of graphs continuously 
for a long time, which we call "preserving structures". We considered two structures, closed 
connected vertex subsets and closed active cliques, and proposed efficient algorithms for 
enumerating these structures preserved during a period no shorter than a prescribed length. 
An interesting direction for future work is to develop efficient algorithms for preserving
structure mining problems for other graph properties.